A Lightweight Distributed Order and Duplication Insensitive Algorithm for Approximate Top-k Queries using Order Statistics
Abstract
1. APPROXIMATE TOP-K
Let {e1, e2, ..., el} be a set of distinct records in a database, with unique IDs {id1, id2, ..., idl}. Let A1, A2, ..., Ap be a set of distinct attributes defined for each record. For every record ei, each attribute Aj is either zero or some positive value. We denote the value of attribute Aj of record ei by Aj(ei). The sum of the attributes of ei is denoted by Ni = ∑_j Aj(ei). We would like to obtain the list of the top k records, ordered by Ni. We present a highly configurable, lightweight, distributed algorithm that solves this problem approximately, based on order statistics.
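To make the problem statement concrete, the following is a minimal sketch of the exact, centralized baseline that the approximate algorithm is meant to approximate. It is not the distributed order-statistics algorithm itself, which the abstract does not describe. The function name `exact_top_k`, the dictionary layout, and the sample values are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def exact_top_k(records: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (id, N_i) pairs with the largest attribute sums.

    `records` maps a record ID to its attribute values A_1(e_i), ..., A_p(e_i);
    each value is assumed to be zero or positive, as in the problem statement.
    """
    # N_i = sum over j of A_j(e_i)
    totals = {rid: sum(attrs) for rid, attrs in records.items()}
    # Keep only the k largest sums, ordered from largest to smallest.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical data: four records with three attributes each.
records = {
    "id1": [3.0, 0.0, 2.0],  # N = 5
    "id2": [1.0, 1.0, 1.0],  # N = 3
    "id3": [0.0, 7.0, 0.0],  # N = 7
    "id4": [4.0, 4.0, 0.0],  # N = 8
}

top2 = exact_top_k(records, 2)
print(top2)  # [('id4', 8.0), ('id3', 7.0)]
```

This baseline needs every attribute value in one place. A distributed, approximate approach would presumably avoid that cost, at the price of some ranking error.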
Similar resources
Unified Framework for Top-k Query Processing in Peer-to-Peer Networks
Supporting queries over dispersed data stored in large-scale distributed systems, such as peer-to-peer networks, naturally calls for ranked retrieval in order to effectively focus on the most relevant (i.e., top-k) results. While top-k retrieval has been actively studied lately, existing algorithms are too restrictive due to their assumptions about how the data is partitioned amongst the variou...
Pay-as-you-go Approximate Join Top-k Processing for the Web of Data Technical Report
For effectively searching the Web of data, ranking of results is crucial. Top-k processing strategies have been proposed to allow an efficient processing of such ranked queries. Top-k strategies aim at computing k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...
Pay-as-you-go Approximate Join Top-k Processing for the Web of Data
For effectively searching the Web of data, ranking of results is crucial. Top-k processing strategies have been proposed to allow an efficient processing of such ranked queries. Top-k strategies aim at computing k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...
Top-k Query Evaluation with Probabilistic Guarantees
Martin Theobald, Gerhard Weikum, Ralf Schenkel Max-Planck Institute of Computer Science D-66123 Saarbruecken, Germany {mtb, weikum, schenkel}@mpi-sb.mpg.de Abstract Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s thresho...
Finding Top-k Approximate Answers to Path Queries
We consider the problem of finding and ranking paths in semistructured data without necessarily knowing its full structure. The query language we adopt comprises conjunctions of regular path queries, allowing path variables to appear in the bodies and the heads of rules, so that paths can be returned to the user. We propose an approximate query matching semantics which adapts standard notions o...